Abstract

We compute a variance lower bound for unbiased estimators in specified statistical models. The construction of the bound is related to the original Cramér-Rao bound, although it does not require the differentiability of the model. Moreover, we show that our efficiency bound is always at least as large as the Cramér-Rao bound in smooth models, thus providing a sharper result.

1 Introduction

Efficiency theory aims to establish an objective criterion to judge whether an estimator is the best possible in a given class. The most famous example is without doubt the Cramér-Rao inequality, which states in its simplest form that the variance of an unbiased estimator in a parametric model is not smaller than the inverse of the Fisher information. The inequality was originally stated in [RR45] and has been the foundation of numerous efficiency theories developed in the literature, such as those due to Le Cam and Hájek (see [Haj70], [LC60]), which extend the Cramér-Rao inequality to larger models with alternative regularity assumptions. We refer to [BKRW98] and [vdV98] for a survey.

In this paper, we introduce a variance lower bound for unbiased estimators in a statistical model. The construction of the bound relies on the same idea as the original Cramér-Rao bound, although no regularity conditions of any kind are needed. The advantage of our approach is threefold. First, an efficiency bound can be computed without differentiability conditions on the model or on the parameter to estimate. Second, the bound is adapted to all types of models: parametric, semiparametric or nonparametric. Finally, the efficiency bound is always greater than or equal to the Cramér-Rao bound (whenever the latter is well defined) and is thus more informative.

The paper is organized as follows. We define our efficiency bound in Section 2 and we compare its performance to the Cramér-Rao bound in differentiable parametric models. We discuss the generalization to semiparametric models in Section 2.2 and provide an asymptotic analysis in Section 2.3. The proofs of our results are postponed to the Appendix.

2 Construction of the efficiency bound



Let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ be an open subset of a Euclidean space endowed with its Borel field, and denote by $\mathcal{P}(\mathcal{X})$ the set of all probability measures on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. We consider the classical statistical model where we observe an i.i.d. sample $X_1, \dots, X_n$ drawn from an unknown measure $\mu$ and we wish to estimate a parameter $\psi$.

The construction of an efficiency bound relies only on two aspects: the model and the parameter to estimate. The model $\mathcal{P}$ is defined as the set of possible values for the measure $\mu$. We shall assume in the sequel that the model is well chosen, so that $\mu \in \mathcal{P}$. A parameter $\psi$ is to be understood as a map $\psi : \mathcal{P} \to H$. In this paper, we restrict to finite dimensional parameters, with $H$ a subset of $\mathbb{R}^q$.

We define the quadratic divergence (or Q-divergence) between two probability measures $\mu$ and $\nu$ on $(\mathcal{X}, \mathcal{B})$ as

$$d(\mu, \nu) = \int_{\mathcal{X}} \left(1 - \frac{d\nu}{d\mu}\right)^2 d\mu \quad \text{if } \nu \ll \mu, \qquad d(\mu, \nu) = +\infty \quad \text{otherwise.}$$

The Q-divergence is Csiszár's $f$-divergence associated with the convex function $f : x \mapsto (1 - x)^2$ (see [Csi67]). Remark that the Q-divergence between two probability measures $\mu$ and $\nu$ is not symmetric, so we shall speak of the quadratic divergence of $\nu$ with respect to $\mu$ to avoid confusion. Moreover, for $A$ a subset of $\mathcal{P}(\mathcal{X})$, we define $d(\mu, A) = \inf_{\nu \in A} d(\mu, \nu)$. Any measure $\mu^* \in A$ such that $d(\mu, \mu^*) = d(\mu, A)$ is called a Q-projection of $\mu$ onto $A$.
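To make the definition concrete, the following Python sketch evaluates the Q-divergence for distributions on a finite set, where the integral reduces to a sum. The function name `q_divergence`, the dictionary representation and the two toy distributions are illustrative assumptions, not part of the paper. The sketch shows the lack of symmetry and the value $+\infty$ when $\nu$ is not absolutely continuous with respect to $\mu$.

```python
from fractions import Fraction


def q_divergence(mu, nu):
    """Q-divergence d(mu, nu) = sum_x mu(x) * (1 - nu(x)/mu(x))**2 on a finite set.

    mu and nu are dicts mapping points to probabilities (hypothetical representation).
    Returns float('inf') when nu puts mass where mu has none (nu not << mu).
    """
    for x, nu_x in nu.items():
        if nu_x > 0 and mu.get(x, 0) == 0:
            return float("inf")
    total = 0
    for x, mu_x in mu.items():
        if mu_x > 0:
            ratio = nu.get(x, 0) / mu_x  # density d(nu)/d(mu) at x
            total += mu_x * (1 - ratio) ** 2
    return total


mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {"a": Fraction(1, 4), "b": Fraction(3, 4)}

print(q_divergence(mu, nu))  # divergence of nu with respect to mu
print(q_divergence(nu, mu))  # divergence of mu with respect to nu (differs)
print(q_divergence(mu, {"a": Fraction(1, 2), "c": Fraction(1, 2)}))  # not absolutely continuous
```

Exact rational arithmetic is used here only so that the two divergences can be compared without rounding noise.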



2.1 Main result 

In the next theorem we show that to each element of a model can be associated a variance lower bound for an unbiased estimator of a parameter $\psi(\mu)$. We use the convention $1/\infty = 0$.

Theorem 2.1 Let $\mathcal{P}$ be a model and $\psi : \mathcal{P} \to H$ a parameter. If $T = T(X_1, \dots, X_n)$ is an unbiased estimator of $\psi$ in the model $\mathcal{P}$, then $\forall \nu \in \mathcal{P} \setminus \{\mu\}$:

$$\operatorname{var}(T) \ge \frac{(\psi(\mu) - \psi(\nu))(\psi(\mu) - \psi(\nu))^t}{(d(\mu, \nu) + 1)^n - 1}.$$

Whenever $\psi$ takes values in $\mathbb{R}^q$ with $q > 1$, the inequality is meant in the sense of quadratic forms, i.e. $A \ge B$ if and only if $A - B$ is positive semi-definite. Observe that this result does not require any regularity conditions on the model. For instance, it is not needed that $\nu$ be absolutely continuous w.r.t. $\mu$, although when it is not, the efficiency bound is null and provides no information.






Let $H^n_\psi(\mu, \cdot)$ denote the functional defined on $\mathcal{P}^* = \mathcal{P} \setminus \{\mu\}$ by

$$H^n_\psi(\mu, \nu) = n \, \frac{(\psi(\mu) - \psi(\nu))(\psi(\mu) - \psi(\nu))^t}{(d(\mu, \nu) + 1)^n - 1}.$$

The quantity $H^n_\psi(\mu, \nu)$ provides a lower bound for $n$ times the variance of an unbiased estimator of $\psi$. Since $H^n_\psi(\mu, \nu)$ is null if $\nu$ is not absolutely continuous w.r.t. $\mu$ or if $d\nu/d\mu$ is not square $\mu$-integrable, it suffices to consider the values of $H^n_\psi(\mu, \nu)$ for measures $\nu = f\mu$ with density $f$ in $\mathcal{F} = \{f : f\mu \in \mathcal{P}, \int f^2 d\mu < \infty\}$. The main advantage is that, $\mathcal{F}$ being a subset of $L^2(\mu)$, it can be endowed with the natural Hilbert space topology.



The result of Theorem 2.1 is all the more informative as the right-hand side of the inequality is large. In the case $q > 1$, the correct way to interpret this result is to consider real-valued linear transformations of $\psi$, for which the result can be stated in the form

$$\forall a \in \mathbb{R}^q, \quad n \operatorname{var}(a^t T) \ge a^t H^n_\psi(\mu, \nu) \, a.$$

Thus, because the case $q > 1$ can be treated by considering real-valued parameters, we shall assume for simplicity that $\psi$ takes values in $\mathbb{R}$, and therefore $H^n_\psi(\mu, \nu) \in [0; +\infty]$.

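We define the efficiency bound for estimating $\psi$ in $\mathcal{P}$ as the supremum over the whole model

$$B^n_\psi(\mathcal{P}) := \sup_{\nu \in \mathcal{P}^*} H^n_\psi(\mu, \nu) = \sup_{f \in \mathcal{F} \setminus \{1\}} H^n_\psi(\mu, f\mu).$$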

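Let $\Theta$ be a subset of $\mathbb{R}^d$ and $\{\mu_\theta\}_{\theta \in \Theta}$ a collection of probability measures on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ with $\mu_{\theta_0} = \mu$. We say that $\{\mu_\theta\}_{\theta \in \Theta}$ is differentiable in $L^2(\mu)$ at $\theta_0$ if there exists a map $g : \mathcal{X} \to \mathbb{R}^d$ such that $\int_{\mathcal{X}} g^t g \, d\mu < \infty$ and such that for all $a \in \mathbb{R}^d$,

$$\lim_{t \to 0} \int_{\mathcal{X}} \left[ \frac{1}{t} \left( \frac{d\mu_{\theta_0 + ta}}{d\mu}(x) - 1 \right) - a^t g(x) \right]^2 d\mu(x) = 0.$$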



The function $g$ is called the score function of the model $\{\mu_\theta\}_\theta$ at $\theta = \theta_0$, while the matrix $I = \int_{\mathcal{X}} g g^t \, d\mu$ is the Fisher information. The score can be seen as a Fréchet differential of the model $\{\mu_\theta\}$ in the $L^2(\mu)$ sense. More usual definitions of the score generally require the model to be differentiable in an almost-sure sense, which is stronger than the condition above. Remark however that, while differentiability in $L^2(\mu)$ is necessary for the sake of this paper, it is less general than differentiability in quadratic mean, discussed for instance in [vdV02].

Let $\psi : \{\mu_\theta\}_\theta \to H$ be a parameter such that the map $\theta \mapsto \psi(\mu_\theta)$ is differentiable at $\theta_0$ (we denote by $\dot\psi(\theta_0) \in \mathbb{R}^{d \times q}$ its derivative matrix). The Cramér-Rao inequality states that if $T = T(X_1, \dots, X_n)$ is an unbiased estimator of $\psi$, then

$$n \operatorname{var}(T) \ge \dot\psi(\theta_0)^t \, I^{-1} \, \dot\psi(\theta_0).$$

When the model $\mathcal{P}$ is differentiable, the Cramér-Rao bound $B_\psi(\mathcal{P}) = \dot\psi(\theta_0)^t I^{-1} \dot\psi(\theta_0)$ thus provides a lower bound for $n$ times the variance of unbiased estimators of $\psi$. The next proposition compares it with our efficiency bound $B^n_\psi$ in smooth models.

Proposition 2.2 Let $\{\mu_\theta\}_{\theta \in \Theta}$ be a differentiable path with $\mu_{\theta_0} = \mu$. Let $\psi : \{\mu_\theta\}_\theta \to \mathbb{R}$ be a map such that $\theta \mapsto \psi(\mu_\theta)$ is differentiable at $\theta_0$. Then $B_\psi(\{\mu_\theta\}_\theta) = \lim_{\theta \to \theta_0} H^n_\psi(\mu, \mu_\theta)$ for all $n \in \mathbb{N}$. In particular,

$$B^n_\psi(\{\mu_\theta\}_\theta) \ge B_\psi(\{\mu_\theta\}_\theta).$$

The efficiency bound $B^n_\psi$ improves on the Cramér-Rao bound since it is defined as the supremum of $\nu \mapsto H^n_\psi(\mu, \nu)$ on the model, while $B_\psi$ is the limit as $\nu \to \mu$. As a result, in differentiable models, the functional $H^n_\psi(\mu, \cdot)$ can be extended by continuity at $\mu$ by setting $H^n_\psi(\mu, \mu) = B_\psi$. In some situations the two bounds are identical (i.e. the supremum of $H^n_\psi(\mu, \nu)$ is approached as $\nu \to \mu$), for example as soon as the Cramér-Rao bound can be reached for finite samples. On the other hand, it is not rare to have the strict inequality $B^n_\psi(\{\mu_\theta\}_\theta) > B_\psi(\{\mu_\theta\}_\theta)$, as we show in the following examples.

Example 1 (Gaussian model). Consider the Gaussian model $\{\mu_\theta\}_{\theta \in \mathbb{R}}$, where $\mu_\theta \sim \mathcal{N}(\theta, 1)$, and let $\psi : \mu_\theta \mapsto e^\theta$. We take $\mu \sim \mathcal{N}(0, 1)$ as the distribution of the observations. In this model, the Cramér-Rao bound is $B_\psi = 1$. On the other hand, we have $d(\mu, \mu_\theta) = e^{\theta^2} - 1$, yielding

$$H^n_\psi(\mu, \mu_\theta) = n \, \frac{(e^\theta - 1)^2}{e^{n\theta^2} - 1}$$

for $\theta \neq 0$. The supremum is reached for $\theta_n = 1/n$, which gives $B^n_\psi = n(e^{1/n} - 1)$. Thus, we observe a strict inequality $B^n_\psi > B_\psi$ for all $n \in \mathbb{N}$. In this case, it is interesting to notice that the bound is attained: $B^n_\psi$ is exactly $n$ times the variance of the optimal unbiased estimator of $e^\theta$ in this model.

Example 2 (exponential model). Consider the model $\{\mu_\theta\}_{\theta > 0}$, where $\mu_\theta$ is an exponential distribution with parameter $\theta$, i.e. $d\mu_\theta(x) = \theta e^{-\theta x} \mathbb{1}\{x > 0\} \, dx$. We want to estimate the parameter $\psi : \mu_\theta \mapsto \theta$, the true value of the parameter being $\theta_0 = 1$. Calculation of the Cramér-Rao bound gives $B_\psi = 1$. On the other hand, the Q-divergence of $\mu_\theta$ w.r.t. $\mu$ is

$$d(\mu, \mu_\theta) = \frac{\theta^2}{2\theta - 1} - 1 \quad \text{for } \theta > \tfrac{1}{2}, \qquad d(\mu, \mu_\theta) = +\infty \quad \text{otherwise.}$$

It follows that

$$H^n_\psi(\mu, \mu_\theta) = n \, \frac{(\theta - 1)^2 (2\theta - 1)^n}{\theta^{2n} - (2\theta - 1)^n} \, \mathbb{1}\{\theta > 1/2\}.$$

Figure 1: Plot of $\theta \mapsto H^n_\psi(\mu, \mu_\theta)$ for $n = 4$ to $15$.

The curves are decreasing as $n$ grows (the top curve corresponds to $n = 4$, the lowest one to $n = 15$). The functions are not defined at $\theta = 1$ but they can be extended by continuity by setting $H^n_\psi(\mu, \mu) = 1$ for all $n \in \mathbb{N}$. This corresponds on the graph to the intersection point of all the curves. We observe that for all $n$ shown, the supremum is larger than the Cramér-Rao bound $B_\psi = 1$.
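As a numerical sanity check of Examples 1 and 2, the following Python sketch evaluates $H^n_\psi(\mu, \mu_\theta)$ on a grid and approximates its supremum. The function names (`h_gaussian`, `h_exponential`, `grid_sup`), the search intervals and the grid resolution are illustrative assumptions, not part of the paper; a grid search only gives an approximation from below of the true supremum.

```python
import math


def h_gaussian(theta, n):
    """Example 1: n (e^theta - 1)^2 / (e^(n theta^2) - 1), for theta != 0."""
    return n * math.expm1(theta) ** 2 / math.expm1(n * theta ** 2)


def h_exponential(theta, n):
    """Example 2: n (theta - 1)^2 / ((d + 1)^n - 1) with d + 1 = theta^2 / (2 theta - 1),
    for theta > 1/2 and theta != 1."""
    d_plus_one = theta ** 2 / (2 * theta - 1)
    return n * (theta - 1) ** 2 / (d_plus_one ** n - 1)


def grid_sup(h, n, lo, hi, steps=20000):
    """Crude grid approximation of the supremum of h(., n) over the open interval (lo, hi)."""
    best = 0.0
    for i in range(1, steps):
        theta = lo + (hi - lo) * i / steps
        try:
            best = max(best, h(theta, n))
        except ZeroDivisionError:
            # h is undefined (0/0) at the true parameter value; skip that grid point.
            continue
    return best


for n in (4, 10, 15):
    gauss = grid_sup(h_gaussian, n, -1.0, 2.0)
    expo = grid_sup(h_exponential, n, 0.5, 10.0)
    # Columns: n, grid supremum (Gaussian), closed form n(e^(1/n) - 1), grid supremum (exponential)
    print(n, gauss, n * math.expm1(1.0 / n), expo)
```

In the Gaussian case the grid value can be compared with the closed form $n(e^{1/n} - 1)$; in the exponential case no closed form is stated in the text, so the sketch only checks that the supremum exceeds the Cramér-Rao bound $1$.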



2.2 Application to semiparametric models 

The Cramér-Rao inequality can be extended to semiparametric models using a more general definition of the Fisher information, calculated by studying differentiable submodels. Based on the idea that the larger the model, the less information we have, a natural definition of the Fisher information in large models is the infimum of the Fisher informations calculated in differentiable submodels (see for instance []). A least favorable path is a differentiable submodel $\{\mu_\theta\}_{\theta \in \Theta}$ for which the infimum is reached, and therefore such that $B_\psi(\mathcal{P}) = B_\psi(\{\mu_\theta\}_\theta)$.

The functional $H^n_\psi(\mu, \cdot)$ turns out to be an efficient tool to construct a least favorable path. To see it, consider the level sets $\mathcal{F}_\theta = \{f \in L^2(\mu) : \psi(f\mu) = \theta\}$ for all values $\theta$ taken by the parameter $\psi$. Setting $\theta_0 = \psi(\mu)$, the expression of the efficiency bound can be written as

$$B^n_\psi = \sup_{\theta \neq \theta_0} \, \sup_{f \in \mathcal{F}_\theta} H^n_\psi(\mu, f\mu) = \sup_{\theta \neq \theta_0} \, n \, \frac{(\theta - \theta_0)^2}{(d(\mu, \mathcal{F}_\theta) + 1)^n - 1}. \qquad (1)$$

In this setting, we see that calculating the efficiency bound reduces to maximizing a function of $\theta$. The idea is that if we choose the least favorable density in each set $\mathcal{F}_\theta$, that is, a function $f_\theta$ maximizing $f \mapsto H^n_\psi(\mu, f\mu)$, the resulting submodel would have to be a least favorable path (if a least favorable measure cannot be reached, we may consider a proper collection of densities arbitrarily close to the least favorable measure in each set $\mathcal{F}_\theta$, leading to a collection of submodels). Since by construction the term $\psi(f\mu) - \psi(\mu)$ is constant when $f$ ranges over $\mathcal{F}_\theta$, a density maximizing $H^n_\psi(\mu, \cdot)$ on $\mathcal{F}_\theta$ is in fact a minimizer of $f \mapsto d(\mu, f\mu)$, which explains the term $d(\mu, \mathcal{F}_\theta)$ in (1).

Definition We call quadratic projection path (or Q-projection path) a submodel $\{\mu_\theta\}_{\theta \in \Theta}$ such that $d(\mu, \mu_\theta) = d(\mu, \mathcal{F}_\theta)$ and $\psi(\mu_\theta) = \theta$ for all $\theta \in \Theta$.

A Q-projection path does not necessarily exist, for instance if the infimum of $d(\mu, \cdot)$ on $\mathcal{F}_\theta$ is not attained for some values of $\theta$. However, a Q-projection path does exist as soon as the map $f \mapsto \psi(f\mu)$ is continuous on $\mathcal{F}$ and $d(\mu, \mathcal{F}_\theta)$ is finite for all $\theta \in \Theta$. By making this continuity assumption, we avoid considering trivial cases, the efficiency bound being infinite if $f \mapsto \psi(f\mu)$ is not continuous as $f$ tends to $1$.

If the sets $\mathcal{F}_\theta$ are convex, a Q-projection path $\{\mu_\theta\}_{\theta \in \Theta}$ is unique, $\mu_\theta$ being defined as the quadratic projection of $\mu$ on $\mathcal{P}_\theta = \{\nu \in \mathcal{P} : \psi(\nu) = \theta\}$. A Q-projection path does not depend on the number of observations, although it contains a maximizer of $H^n_\psi(\mu, \cdot)$ for all $n \in \mathbb{N}$. In a certain way, it contains the whole information of the model.

As a straightforward consequence of (1), a Q-projection path $\{\mu_\theta\}_\theta$ satisfies $B^n_\psi(\mathcal{P}) = B^n_\psi(\{\mu_\theta\}_\theta)$ for all $n \in \mathbb{N}$. Moreover, remark that a Q-projection path is a least favorable path if and only if it is differentiable at $\mu = \mu_{\theta_0}$. These remarks are illustrated in the following examples.

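Example 3 (moment condition model). Let $\mathcal{P} = \{\nu \in \mathcal{P}(\mathcal{X}) : \int_{\mathcal{X}} \Phi \, d\nu = 0\}$ for $\Phi : \mathcal{X} \to \mathbb{R}^k$ a known map. We want to estimate $\theta_0 = \int h \, d\mu \in \mathbb{R}$, where $h \in L^2(\mu)$ is a given function. For all $\theta \in \mathbb{R}$, $\mathcal{F}_\theta$ is an affine subspace of $L^2(\mu)$ of finite codimension, defined by continuous linear constraints; it is therefore closed and convex. Hence, there exists a unique Q-projection path $\{\mu_\theta\}_\theta$, with densities $f_\theta$ w.r.t. $\mu$. Denote by $h^\perp$ the part of $h$ orthogonal to $\Phi$ in $L^2(\mu)$: $h^\perp = h - \left(\int h \Phi \, d\mu\right)^t \left[\int \Phi \Phi^t \, d\mu\right]^{-1} \Phi$. We have:

$$f_\theta = \arg\min_{f \in \mathcal{F}_\theta} E(1 - f(X))^2 = 1 - (\theta_0 - \theta) V^{-1} (h^\perp - \theta_0)$$

with $V = \operatorname{var}(h^\perp(X))$. Moreover, $d(\mu, \mu_\theta) = E(1 - f_\theta(X))^2 = (\theta - \theta_0)^2 V^{-1}$, yielding

$$B^n_\psi = \sup_{\theta \neq \theta_0} \frac{n(\theta - \theta_0)^2}{((\theta - \theta_0)^2 V^{-1} + 1)^n - 1} = V.$$

Note that the model $\{\mu_\theta\}_{\theta \in \mathbb{R}}$ is smooth, with Cramér-Rao bound $B_\psi = B^n_\psi = V$ for all integers $n$.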
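The formulas of Example 3 can be checked on a toy case. The Python sketch below assumes a hypothetical setting that is not in the paper: $\mu$ uniform on five points, a scalar moment function $\Phi(x) = x$ and $h(x) = x^2 + x$. It verifies that the density $1 + (\theta - \theta_0)V^{-1}(h^\perp - \theta_0)$ satisfies the constraints defining $\mathcal{F}_\theta$, that its Q-divergence equals $(\theta - \theta_0)^2 V^{-1}$, and that $H^n_\psi$ stays below $V$. It does not verify that this density is the minimizer.

```python
from fractions import Fraction

# Hypothetical toy setting: mu uniform on five points, Phi(x) = x, h(x) = x^2 + x.
points = [-2, -1, 0, 1, 2]
weight = Fraction(1, len(points))


def mean(g):
    """Expectation of g(X) under mu."""
    return sum(weight * g(x) for x in points)


def phi(x):
    return Fraction(x)  # moment function, centred under mu


def h(x):
    return Fraction(x * x + x)  # theta_0 = E h(X)


theta0 = mean(h)
# Part of h orthogonal to Phi in L2(mu); Phi is scalar, so the matrix inverse is a division.
coef = mean(lambda x: h(x) * phi(x)) / mean(lambda x: phi(x) ** 2)


def h_perp(x):
    return h(x) - coef * phi(x)


V = mean(lambda x: h_perp(x) ** 2) - mean(h_perp) ** 2  # var(h_perp(X))


def density(theta):
    """Candidate density f_theta = 1 + (theta - theta0) V^{-1} (h_perp - theta0)."""
    c = (theta - theta0) / V
    return lambda x: 1 + c * (h_perp(x) - theta0)


def q_div(f):
    """d(mu, f mu) = E (1 - f(X))^2."""
    return mean(lambda x: (1 - f(x)) ** 2)


def H(theta, n):
    """H^n_psi(mu, mu_theta) along the candidate path."""
    d = q_div(density(theta))
    return n * (theta - theta0) ** 2 / ((d + 1) ** n - 1)


theta = Fraction(5, 2)
f = density(theta)
# Constraints: total mass, moment condition, value of the parameter.
print(mean(f), mean(lambda x: f(x) * phi(x)), mean(lambda x: f(x) * h(x)))
# Q-divergence against the closed form, then V and two values of H.
print(q_div(f), (theta - theta0) ** 2 / V, V, H(theta, 1), H(theta, 3))
```

For $n = 1$ the ratio equals $V$ for every $\theta \neq \theta_0$, while for $n \ge 2$ it is strictly smaller, consistent with the supremum being approached as $\theta \to \theta_0$.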
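Example 4 (empirical likelihood). Assume that the true measure $\mu$ satisfies the constraint $\int \Phi_{\theta_0} \, d\mu = 0$ for some known collection of maps $\{\Phi_\theta : \theta \in \Theta\}$, where $\theta_0$ is the parameter we intend to estimate. The sets $\mathcal{F}_\theta = \{f \in L^2(\mu) : \int \Phi_\theta f \, d\mu = 0\}$ are closed and convex. Denote by $\{\mu_\theta\}_\theta$ the Q-projection path, with densities $f_\theta$ given by

$$f_\theta = \arg\min_{f \in \mathcal{F}_\theta} E(1 - f(X))^2 = 1 - \left(\int \Phi_\theta \, d\mu\right)^t \left[\operatorname{var}(\Phi_\theta(X))\right]^{-1} \left(\Phi_\theta - \int \Phi_\theta \, d\mu\right).$$

If we assume that $\theta \mapsto \Phi_\theta$ is differentiable in a neighbourhood of $\theta_0$, with derivative $\nabla\Phi(\cdot)$, the path $\{\mu_\theta\}_\theta$ is also differentiable and we have

$$d(\mu, \mu_\theta) = \left(\int \Phi_\theta \, d\mu\right)^t \left[\operatorname{var}(\Phi_\theta(X))\right]^{-1} \left(\int \Phi_\theta \, d\mu\right),$$

yielding

$$B^n_\psi = \sup_{\theta \neq \theta_0} \frac{n(\theta - \theta_0)^2}{(d(\mu, \mu_\theta) + 1)^n - 1}.$$

We recover the asymptotic efficiency bound of [QL94] in this model.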



2.3 Asymptotic properties 

We are now interested in the asymptotic analysis of the efficiency bound. Writing the expansion

$$(d(\mu, \nu) + 1)^n - 1 = n \, d(\mu, \nu) + \frac{n(n-1)}{2} \, d(\mu, \nu)^2 + \dots$$

we see that the sequence $\{H^n_\psi(\mu, \cdot)\}_{n \in \mathbb{N}}$ is decreasing and converges pointwise toward $0$ as $n \to \infty$. So, the non-negative sequence $\{B^n_\psi\}_{n \in \mathbb{N}}$ is also decreasing and therefore converges (or is infinite). We now aim to prove that, in regular situations, the efficiency bound converges toward the Cramér-Rao bound.

Lemma 2.3 Assume that $B^{n_0}_\psi < \infty$ for some $n_0 \in \mathbb{N}$. Then, for all $\varepsilon > 0$, $H^n_\psi(\mu, \cdot)$ converges uniformly towards $0$ on the set $\{\nu \in \mathcal{P} : d(\mu, \nu) > \varepsilon\}$ as $n \to \infty$.

The condition that $B^{n_0}_\psi$ is finite for some integer $n_0$ is necessary to ensure the existence of an unbiased estimator with finite variance, even asymptotically. However, it may occur that this condition is not fulfilled while the Cramér-Rao bound exists and is finite.



An interpretation of Lemma 2.3 is that for every element $\nu$ of the model at non-zero distance from $\mu$ (so basically any $\nu \in \mathcal{P}^*$), the increasing number of observations eventually gives so much information that the true distribution cannot be mistaken for $\nu$. Thus, only the behaviour of the measures of the model in the neighborhood of $\mu$ matters asymptotically. As a result, a measure $\nu \in \mathcal{P}$ far from $\mu$ no longer has any influence on the variance of an estimator as soon as the number of observations is large enough.

Theorem 2.4 Assume that $B^{n_0}_\psi(\mathcal{P}) < \infty$ for some $n_0 \in \mathbb{N}$. If there exists a Q-projection path $\{\mu_\theta\}_\theta$ differentiable at $\mu$, then

$$\lim_{n \to \infty} B^n_\psi(\mathcal{P}) = B_\psi(\mathcal{P}).$$

This result is not surprising, as we know that the efficiency bound only depends asymptotically on the behaviour of the model in the neighborhood of $\mu$. This convergence can be observed in Examples 1 and 2 of Section 2.1. We emphasize that, in a parametric model $\{\mu_\theta\}_\theta$, the efficiency bound has a positive limit $B^\infty_\psi$ in non-trivial cases as soon as the map $\theta \mapsto \sqrt{d(\mu, \mu_\theta)}$ is differentiable at $\theta_0$, while the construction of the Cramér-Rao bound requires the much stronger condition of differentiability in $L^2(\mu)$. Thus, the efficiency bound $B^n_\psi$ is computable in a larger class of models, while providing at least as good an asymptotic analysis as the Cramér-Rao inequality in smooth models.

3 Appendix 



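Proof of Theorem 2.1. First assume that $\psi(\mu) \in \mathbb{R}$. If $d(\mu, \nu) = +\infty$, the inequality is trivially verified. If not, first remark that

$$\psi(\mu) - \psi(\nu) = E\left( (T - \psi(\mu)) \left(1 - \frac{d\nu^{\otimes n}}{d\mu^{\otimes n}}(X_1, \dots, X_n)\right) \right)$$

where the expectation is meant under the true distribution of the observations, $\mu^{\otimes n}$. Applying the Cauchy-Schwarz inequality, we get

$$|\psi(\mu) - \psi(\nu)| \le \sqrt{\operatorname{var}(T)} \, \sqrt{d(\mu^{\otimes n}, \nu^{\otimes n})}.$$

It is easy to see that $d(\mu^{\otimes n}, \nu^{\otimes n}) = (d(\mu, \nu) + 1)^n - 1$, which yields

$$\operatorname{var}(T) \ge \frac{(\psi(\mu) - \psi(\nu))^2}{(d(\mu, \nu) + 1)^n - 1}.$$

If $\psi(\mu) \in \mathbb{R}^q$ with $q > 1$, we apply the previous result to the estimator $a^t T \in \mathbb{R}$ for some $a \in \mathbb{R}^q$. We get for all $\nu \neq \mu$:

$$\operatorname{var}(a^t T) = a^t \operatorname{var}(T) \, a \ge \frac{(a^t(\psi(\mu) - \psi(\nu)))^2}{(d(\mu, \nu) + 1)^n - 1} = a^t \, \frac{(\psi(\mu) - \psi(\nu))(\psi(\mu) - \psi(\nu))^t}{(d(\mu, \nu) + 1)^n - 1} \, a.$$

The inequality holds for all $a \in \mathbb{R}^q$, which proves the result.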
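The identity $d(\mu^{\otimes n}, \nu^{\otimes n}) = (d(\mu, \nu) + 1)^n - 1$ used in the proof can be checked exactly on finite distributions by enumerating the product space. The sketch below is only an illustration under that finite-support assumption; the helper names `q_div` and `product_measure` and the two distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod


def q_div(mu, nu):
    """d(mu, nu) for finite distributions given as aligned lists, assuming mu(x) > 0 everywhere."""
    return sum(m * (1 - v / m) ** 2 for m, v in zip(mu, nu))


def product_measure(p, n):
    """Point masses of the n-fold product measure, enumerated in a fixed order."""
    return [prod(masses) for masses in product(p, repeat=n)]


mu = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
nu = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
d = q_div(mu, nu)

for n in (1, 2, 3, 4):
    lhs = q_div(product_measure(mu, n), product_measure(nu, n))
    # Both sides are exact rationals and should coincide.
    print(n, lhs, (d + 1) ** n - 1)
```

The identity reflects the fact that $d(\mu, \nu) + 1 = \int (d\nu/d\mu)^2 \, d\mu$, a quantity that is multiplicative over independent coordinates.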



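Proof of Proposition 2.2. First remark that if $\{\mu_\theta\}_{\theta \in \Theta}$ is differentiable in $L^2(\mu)$ at $\mu = \mu_{\theta_0}$ with score $g$, the limit as $\theta \to \theta_0$ of $d(\mu, \mu_\theta)/(\theta - \theta_0)^2$ exists and is equal to the Fisher information $\int g^2 \, d\mu$. In particular, we have for a fixed $n \in \mathbb{N}$,

$$(d(\mu, \mu_\theta) + 1)^n - 1 = n \, d(\mu, \mu_\theta) + o(|\theta - \theta_0|^2).$$

Hence,

$$\lim_{\theta \to \theta_0} H^n_\psi(\mu, \mu_\theta) = \lim_{\theta \to \theta_0} \frac{(\psi(\mu_\theta) - \psi(\mu))^2}{(\theta - \theta_0)^2} \cdot \frac{(\theta - \theta_0)^2}{d(\mu, \mu_\theta)} = \frac{\dot\psi(\theta_0)^2}{\int g^2 \, d\mu} = B_\psi(\{\mu_\theta\}_\theta).$$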

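Proof of Lemma 2.3. For all $\nu \neq \mu$, we know that $H^{n_0}_\psi(\mu, \nu) \le B^{n_0}_\psi$. The sequence $\{B^n_\psi\}_{n \in \mathbb{N}}$ is decreasing as $n \to \infty$; thus, if $n \ge n_0$,

$$\forall \nu \neq \mu, \quad H^n_\psi(\mu, \nu) \le \frac{n B^{n_0}_\psi}{n_0} \, \frac{(d(\mu, \nu) + 1)^{n_0} - 1}{(d(\mu, \nu) + 1)^n - 1}.$$

Since the function $x \mapsto ((x + 1)^{n_0} - 1)/((x + 1)^n - 1)$ is decreasing on the interval $(\varepsilon; +\infty)$ as soon as $n > n_0 (\varepsilon + 1)^{n_0}/((\varepsilon + 1)^{n_0} - 1)$, we conclude that for large enough values of $n$

$$\forall \varepsilon > 0, \quad \sup_{d(\mu, \nu) > \varepsilon} H^n_\psi(\mu, \nu) \le \frac{n B^{n_0}_\psi}{n_0} \, \frac{(\varepsilon + 1)^{n_0} - 1}{(\varepsilon + 1)^n - 1}.$$

The right-hand side tends to $0$ as $n \to \infty$ for all $\varepsilon > 0$, which ends the proof.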



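Proof of Theorem 2.4. Write $B^\infty_\psi$ for the limit of $B^n_\psi(\mathcal{P})$ as $n \to \infty$. The theorem is true if $B^\infty_\psi = 0$. Now, assume that $B^\infty_\psi > 0$ and recall that $B^n_\psi(\mathcal{P}) = B^n_\psi(\{\mu_\theta\}_\theta)$ for all $n \in \mathbb{N}$. Let $\{\mu_n\}_{n \in \mathbb{N}}$ be a sequence of measures in $\{\mu_\theta\}_\theta$, suitably chosen so that $\lim_{n \to \infty} H^n_\psi(\mu, \mu_n) = B^\infty_\psi$. We want to prove that $\lim_{n \to \infty} d(\mu, \mu_n) = 0$. By contradiction, if there exist $\varepsilon > 0$ and an increasing sequence of integers $\{n_k\}_{k \in \mathbb{N}}$ such that $\forall k \in \mathbb{N}$, $d(\mu, \mu_{n_k}) > \varepsilon$, then:

$$H^{n_k}_\psi(\mu, \mu_{n_k}) \le \sup_{d(\mu, \nu) > \varepsilon} H^{n_k}_\psi(\mu, \nu) \xrightarrow[k \to \infty]{} 0$$

by Lemma 2.3, which conflicts with the fact that $\lim_{k \to \infty} H^{n_k}_\psi(\mu, \mu_{n_k}) = B^\infty_\psi > 0$. So, we conclude that $\lim_{n \to \infty} d(\mu, \mu_n) = 0$. Since $H^n_\psi(\mu, \cdot)$ is pointwise decreasing as $n \to \infty$, we get that for all $n \in \mathbb{N}$,

$$B^\infty_\psi = \lim_{k \to \infty} H^k_\psi(\mu, \mu_k) \le \lim_{k \to \infty} H^n_\psi(\mu, \mu_k) = B_\psi(\{\mu_\theta\}_\theta),$$

the last equality following from Proposition 2.2. So, $\{\mu_\theta\}_\theta$ is a least favorable path of the model and therefore satisfies $B_\psi(\mathcal{P}) = B_\psi(\{\mu_\theta\}_\theta)$, yielding $B^\infty_\psi(\mathcal{P}) \le B_\psi(\mathcal{P})$. The reverse inequality being an obvious consequence of Proposition 2.2, we conclude that $B^\infty_\psi(\mathcal{P}) = B_\psi(\mathcal{P})$.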